MACS 30100
University of Chicago
Hyperplane in two dimensions
\[\beta_0 + \beta_1 X_1 + \beta_2 X_2 = 0\]
Hyperplane in \(p\) dimensions
\[\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p = 0\]
Points on either side of the hyperplane

A point \(X = (X_1, X_2, \dots, X_p)^T\) lies on one side of the hyperplane if

\[\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p > 0\]

and on the other side if

\[\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p < 0\]
Separating hyperplane

Suppose we have \(n\) training observations with predictors \(x_1, \dots, x_n \in \mathbb{R}^p\),

\[x_1 = \begin{pmatrix} x_{11} \\ \vdots \\ x_{1p} \end{pmatrix}, \dots, x_n = \begin{pmatrix} x_{n1} \\ \vdots \\ x_{np} \end{pmatrix}\]

and associated class labels \(y_1, \dots, y_n \in \{-1, 1\}\). A separating hyperplane has the property that

\[\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} > 0 \text{ if } y_i = 1\]

\[\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} < 0 \text{ if } y_i = -1\]

or, equivalently,

\[y_i(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) > 0 \; \forall \; i = 1, \dots, n\]
Maximal margin classifier

The maximal margin hyperplane solves

\[\begin{aligned} & \underset{\beta_0, \beta_1, \dots, \beta_p}{\text{maximize}} & & M \\ & \text{s.t.} & & \sum_{j=1}^p \beta_j^2 = 1, \\ & & & y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}) \geq M \; \forall \; i = 1, \dots, n \\ \end{aligned}\]

where \(M\) is the margin: the minimal distance from the training observations to the hyperplane.
Support vector classifier

Allowing some observations to violate the margin (or even cross the hyperplane) gives the support vector classifier:

\[\begin{aligned} & \underset{\beta_0, \beta_1, \dots, \beta_p, \epsilon_1, \dots, \epsilon_n}{\text{maximize}} & & M \\ & \text{s.t.} & & \sum_{j=1}^p \beta_j^2 = 1, \\ & & & y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}) \geq M(1 - \epsilon_i), \\ & & & \epsilon_i \geq 0, \sum_{i = 1}^n \epsilon_i \leq C \\ \end{aligned}\]

where the \(\epsilon_i\) are slack variables and \(C\) is a nonnegative tuning parameter that bounds the total amount of slack.
Enlarging the feature space

One route to a non-linear decision boundary: fit the support vector classifier using \(2p\) features,

\[X_1, X_1^2, X_2, X_2^2, \dots, X_p, X_p^2\]

so the optimization problem becomes

\[\begin{aligned} & \underset{\beta_0, \beta_{11}, \beta_{12}, \dots, \beta_{p1}, \beta_{p2}, \epsilon_1, \dots, \epsilon_n}{\text{maximize}} & & M \\ & \text{s.t.} & & y_i \left( \beta_0 + \sum_{j = 1}^p \beta_{j1} x_{ij} + \sum_{j = 1}^p \beta_{j2} x_{ij}^2 \right) \geq M(1 - \epsilon_i), \\ & & & \epsilon_i \geq 0, \sum_{i = 1}^n \epsilon_i \leq C, \sum_{j = 1}^p \sum_{k = 1}^2 \beta_{jk}^2 = 1 \\ \end{aligned}\]
Inner product: \(\langle a,b \rangle = \sum_{i = 1}^r a_i b_i\)
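A quick check in R (a minimal sketch; the two vectors are arbitrary stand-ins): `%*%` applied to a pair of plain vectors computes exactly this sum, which is presumably what produced the output below.

```r
(x <- 1:5)  # printed: 1 2 3 4 5
(y <- 1:5)  # printed: 1 2 3 4 5

# inner product: sum of elementwise products (%*% returns a 1 x 1 matrix)
x %*% y     # 1*1 + 2*2 + 3*3 + 4*4 + 5*5 = 55
```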
## [1] 1 2 3 4 5
## [1] 1 2 3 4 5
## [,1]
## [1,] 55
Inner product of two observations \(x_i\) and \(x_{i'}\):

\[\langle x_i, x_{i'} \rangle = \sum_{j = 1}^p x_{ij} x_{i'j}\]
Linear support vector classifier

The linear support vector classifier can be represented as
\[f(x) = \beta_0 + \sum_{i = 1}^n \alpha_i \langle x, x_i \rangle\]
Only the support vectors have \(\alpha_i \neq 0\), so the sum can be restricted to the set \(\mathbb{S}\) of support-vector indices:

\[f(x) = \beta_0 + \sum_{i \in \mathbb{S}} \alpha_i \langle x, x_i \rangle\]
Generalization of the inner product

Replace the inner product with a kernel: a function that quantifies the similarity of two observations,

\[K(x_i, x_{i'})\]
Linear kernel
\[K(x_i, x_{i'}) = \sum_{j = 1}^p x_{ij} x_{i'j}\]
Polynomial kernel of degree \(d\)

\[K(x_i, x_{i'}) = \left( 1 + \sum_{j = 1}^p x_{ij} x_{i'j} \right)^d\]
Support vector machine

Combining the support vector classifier with a non-linear kernel yields the support vector machine:

\[f(x) = \beta_0 + \sum_{i \in \mathbb{S}} \alpha_i K(x, x_i)\]
Radial kernel

\[K(x_i, x_{i'}) = \exp \left( -\gamma \sum_{j=1}^p (x_{ij} - x_{i'j})^2 \right)\]

where \(\gamma\) is a positive constant.
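Applied to the Titanic data, predicting `Survived` from `Age` and `Fare`: the linear-kernel output below comes from `e1071::tune()` (the fitted call is echoed in the output). A minimal sketch, assuming a pre-built `titanic_split` train/test partition; the ROC step with pROC is likewise an assumption:

```r
library(tidyverse)
library(e1071)
library(pROC)

# grid of cost values, evaluated with tune()'s default 10-fold CV, linear kernel
titanic_lin_tune <- tune(svm, Survived ~ Age + Fare,
                         data = as_tibble(titanic_split$train),
                         kernel = "linear",
                         ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(titanic_lin_tune)             # CV error by cost
summary(titanic_lin_tune$best.model)  # best model, refit on the training set

# test-set AUC from the SVM decision values
dv <- attributes(
  predict(titanic_lin_tune$best.model, as_tibble(titanic_split$test),
          decision.values = TRUE)
)$decision.values
auc(roc(as_tibble(titanic_split$test)$Survived, as.numeric(dv)))
```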
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 5
##
## - best performance: 0.364
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.400 0.0686
## 2 1e-02 0.390 0.0779
## 3 1e-01 0.376 0.0729
## 4 1e+00 0.366 0.0760
## 5 5e+00 0.364 0.0735
## 6 1e+01 0.364 0.0735
## 7 1e+02 0.364 0.0735
##
## Call:
## best.tune(method = svm, train.x = Survived ~ Age + Fare, data = as_tibble(titanic_split$train),
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)),
## kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 5
## gamma: 0.5
##
## Number of Support Vectors: 372
##
## ( 186 186 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
## Area under the curve: 0.759
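The same grid search with a polynomial kernel (the echoed output shows e1071's defaults of degree 3 and coef.0 = 0); a sketch under the same assumptions:

```r
titanic_poly_tune <- tune(svm, Survived ~ Age + Fare,
                          data = as_tibble(titanic_split$train),
                          kernel = "polynomial",
                          ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(titanic_poly_tune)
summary(titanic_poly_tune$best.model)
```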
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.001
##
## - best performance: 0.394
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.394 0.0844
## 2 1e-02 0.398 0.0872
## 3 1e-01 0.396 0.0863
## 4 1e+00 0.398 0.0851
## 5 5e+00 0.396 0.0858
## 6 1e+01 0.398 0.0851
## 7 1e+02 0.396 0.0858
##
## Call:
## best.tune(method = svm, train.x = Survived ~ Age + Fare, data = as_tibble(titanic_split$train),
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)),
## kernel = "polynomial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 0.001
## degree: 3
## gamma: 0.5
## coef.0: 0
##
## Number of Support Vectors: 397
##
## ( 198 199 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
## Area under the curve: 0.724
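And with a radial kernel, again sketched under the same assumptions (gamma is left at e1071's default of one over the number of predictors, here 0.5):

```r
titanic_rad_tune <- tune(svm, Survived ~ Age + Fare,
                         data = as_tibble(titanic_split$train),
                         kernel = "radial",
                         ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(titanic_rad_tune)
summary(titanic_rad_tune$best.model)
```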
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 100
##
## - best performance: 0.326
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.400 0.0625
## 2 1e-02 0.400 0.0625
## 3 1e-01 0.358 0.0830
## 4 1e+00 0.342 0.0819
## 5 5e+00 0.330 0.0634
## 6 1e+01 0.328 0.0681
## 7 1e+02 0.326 0.0700
##
## Call:
## best.tune(method = svm, train.x = Survived ~ Age + Fare, data = as_tibble(titanic_split$train),
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)),
## kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 100
## gamma: 0.5
##
## Number of Support Vectors: 334
##
## ( 165 169 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
## Area under the curve: 0.7
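The same kernels can be compared on voter-turnout data, with `vote96` (turnout) as the outcome and mental-health, demographic, and income predictors. A minimal sketch of the setup, assuming `modelr::resample_partition()` and a hypothetical file name:

```r
library(tidyverse)
library(modelr)

# "mental_health.csv" is a hypothetical file name;
# qualitative variables are recoded as factors
mh <- read_csv("mental_health.csv") %>%
  mutate_at(vars(vote96, black, female, married), as.factor) %>%
  na.omit()

# 70/30 train/test split (the training set ends up with 816 of the 1,165 rows)
mh_split <- resample_partition(mh, c(test = 0.3, train = 0.7))

mh  # the tibble printed below
```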
## # A tibble: 1,165 × 8
## vote96 mhealth_sum age educ black female married inc10
## <fctr> <dbl> <dbl> <dbl> <fctr> <fctr> <fctr> <dbl>
## 1 1 0 60 12 0 0 0 4.81
## 2 1 1 36 12 0 0 1 8.83
## 3 0 7 21 13 0 0 0 1.74
## 4 0 6 29 13 0 0 0 10.70
## 5 1 1 41 15 1 1 1 8.83
## 6 1 2 48 20 0 0 1 8.83
## 7 0 9 20 12 0 1 0 7.22
## 8 0 12 27 11 0 1 0 1.20
## 9 1 2 28 16 0 0 1 7.22
## 10 1 0 72 14 0 0 1 4.01
## # ... with 1,155 more rows
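Linear kernel first, tuning cost over the same grid (the call is echoed in the output):

```r
mh_lin_tune <- tune(svm, vote96 ~ ., data = as_tibble(mh_split$train),
                    kernel = "linear",
                    ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(mh_lin_tune)
summary(mh_lin_tune$best.model)
```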
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 5
##
## - best performance: 0.294
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.315 0.0513
## 2 1e-02 0.315 0.0513
## 3 1e-01 0.300 0.0586
## 4 1e+00 0.296 0.0633
## 5 5e+00 0.294 0.0647
## 6 1e+01 0.294 0.0647
## 7 1e+02 0.294 0.0647
##
## Call:
## best.tune(method = svm, train.x = vote96 ~ ., data = as_tibble(mh_split$train),
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)),
## kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 5
## gamma: 0.125
##
## Number of Support Vectors: 513
##
## ( 258 255 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
## Area under the curve: 0.777
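Then the polynomial kernel:

```r
mh_poly_tune <- tune(svm, vote96 ~ ., data = as_tibble(mh_split$train),
                     kernel = "polynomial",
                     ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(mh_poly_tune)
summary(mh_poly_tune$best.model)
```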
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.284
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.315 0.0389
## 2 1e-02 0.315 0.0389
## 3 1e-01 0.304 0.0388
## 4 1e+00 0.284 0.0459
## 5 5e+00 0.300 0.0397
## 6 1e+01 0.301 0.0309
## 7 1e+02 0.311 0.0339
##
## Call:
## best.tune(method = svm, train.x = vote96 ~ ., data = as_tibble(mh_split$train),
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)),
## kernel = "polynomial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 1
## degree: 3
## gamma: 0.125
## coef.0: 0
##
## Number of Support Vectors: 506
##
## ( 263 243 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
## Area under the curve: 0.73
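And the radial kernel:

```r
mh_rad_tune <- tune(svm, vote96 ~ ., data = as_tibble(mh_split$train),
                    kernel = "radial",
                    ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
summary(mh_rad_tune)
summary(mh_rad_tune$best.model)
```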
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.288
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.315 0.0535
## 2 1e-02 0.315 0.0535
## 3 1e-01 0.314 0.0509
## 4 1e+00 0.288 0.0524
## 5 5e+00 0.291 0.0561
## 6 1e+01 0.293 0.0568
## 7 1e+02 0.313 0.0537
##
## Call:
## best.tune(method = svm, train.x = vote96 ~ ., data = as_tibble(mh_split$train),
## ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)),
## kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.125
##
## Number of Support Vectors: 510
##
## ( 266 244 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
## Area under the curve: 0.749
| SVM kernel | 10-fold CV error rate |
|---|---|
| Linear | 0.294 |
| Polynomial | 0.284 |
| Radial | 0.288 |

The polynomial kernel gives the lowest cross-validated error rate on the training set, though all three kernels perform similarly.
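For comparison, a plain logistic regression on the same training set (the `glm()` call is echoed below). A sketch of the fit and test-set metrics, where the 0.5 classification threshold is an assumption:

```r
mh_logit <- glm(vote96 ~ ., family = binomial, data = as_tibble(mh_split$train))
summary(mh_logit)

# test-set error rate at a 0.5 threshold
mh_test <- as_tibble(mh_split$test)
logit_prob <- predict(mh_logit, newdata = mh_test, type = "response")
logit_pred <- as.numeric(logit_prob > 0.5)
mean(logit_pred != as.numeric(as.character(mh_test$vote96)))

auc(roc(mh_test$vote96, logit_prob))  # test-set AUC
```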
##
## Call:
## glm(formula = vote96 ~ ., family = binomial, data = as_tibble(mh_split$train))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.508 -1.049 0.534 0.855 1.948
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.00019 0.60162 -6.65 3.0e-11 ***
## mhealth_sum -0.08478 0.02795 -3.03 0.0024 **
## age 0.04332 0.00585 7.41 1.3e-13 ***
## educ 0.20305 0.03439 5.90 3.5e-09 ***
## black1 0.11542 0.23625 0.49 0.6252
## female1 0.20224 0.16607 1.22 0.2233
## married1 0.21052 0.18544 1.14 0.2563
## inc10 0.06051 0.03143 1.93 0.0542 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1016.74 on 815 degrees of freedom
## Residual deviance: 876.12 on 808 degrees of freedom
## AIC: 892.1
##
## Number of Fisher Scoring iterations: 4
## Area under the curve: 0.778
The test error rate for the logistic regression model is 0.261.
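The node-by-node output below has the shape of a printed `tree::tree()` fit; a minimal sketch, assuming the tree package:

```r
library(tree)

mh_tree <- tree(vote96 ~ ., data = as_tibble(mh_split$train))
mh_tree  # node-by-node structure printed below

# test-set error rate
tree_pred <- predict(mh_tree, newdata = as_tibble(mh_split$test), type = "class")
mean(tree_pred != as_tibble(mh_split$test)$vote96)
```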
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 816 1000 1 ( 0 1 )
## 2) age < 48.5 516 700 1 ( 0 1 )
## 4) educ < 12.5 191 300 0 ( 1 0 )
## 8) educ < 11.5 55 60 0 ( 1 0 ) *
## 9) educ > 11.5 136 200 1 ( 0 1 ) *
## 5) educ > 12.5 325 400 1 ( 0 1 )
## 10) mhealth_sum < 4.5 259 300 1 ( 0 1 ) *
## 11) mhealth_sum > 4.5 66 90 0 ( 1 0 ) *
## 3) age > 48.5 300 300 1 ( 0 1 )
## 6) educ < 12.5 153 200 1 ( 0 1 )
## 12) inc10 < 1.08335 43 60 1 ( 0 1 ) *
## 13) inc10 > 1.08335 110 100 1 ( 0 1 ) *
## 7) educ > 12.5 147 60 1 ( 0 1 ) *
## Area under the curve: 0.576
The test error rate for the decision tree model is 0.315.
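Bagging is a random forest with `mtry` equal to the number of predictors, so all 7 are considered at every split; the echoed call below confirms `mtry = 7`:

```r
library(randomForest)

# bagging: consider all 7 predictors at every split
mh_bag <- randomForest(vote96 ~ ., data = as_tibble(mh_split$train), mtry = 7)
mh_bag  # OOB error estimate and confusion matrix printed below
```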
##
## Call:
## randomForest(formula = vote96 ~ ., data = as_tibble(mh_split$train), mtry = 7)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 7
##
## OOB estimate of error rate: 32.4%
## Confusion matrix:
## 0 1 class.error
## 0 89 168 0.654
## 1 96 463 0.172
## Area under the curve: 0.714
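The default random forest instead samples floor(sqrt(7)) = 2 predictors at each split, which decorrelates the trees and here lowers the OOB error from 32.4% to 30.4%:

```r
# default mtry for classification is floor(sqrt(p)) = 2 here
mh_rf <- randomForest(vote96 ~ ., data = as_tibble(mh_split$train))
mh_rf
```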
##
## Call:
## randomForest(formula = vote96 ~ ., data = as_tibble(mh_split$train))
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 30.4%
## Confusion matrix:
## 0 1 class.error
## 0 83 174 0.677
## 1 74 485 0.132
## Area under the curve: 0.743